library(tidyverse)
library(readr)
library(scales)

library(ggplot2)
library(gt)
library(gtable)
library(DT)

Insight in the economy of an Massively Multiplayer Online Role-Playing Game using the example of Guild Wars 2

Using the data science lifecycle of plan, data, model and deployment, I want to provide insights into many aspects of data science, programming in R and the world of the online game Guild Wars 2.

data science lifecycle

Intro

Malcolm Forbes once said, “Money isn’t everything as long as you have enough of it”. Sure, money doesn’t buy happiness, but it can buy a lot of things. This is not only true for the real world, but especially in online games, where micro transactions and payed services have become the new normal.

More and more games give you the choice to invest a lot of time or a lot of money. Both ways you’ll somehow achieve the goal of the game faster. In some games you can even exchange digital goods for real money.

To get to the point, having digital currency in games can save you time, stress and even real money.

There are multiple ways to earn money in games. Often you’ll get on with just playing the game and getting some money from enemies, but for buying the good stuff, one needs a lot more money.

One way may be farming, which means hunting special items in high amounts and selling them to other players. Here’s the question, which items are worth collecting and can be sold for which prices?

Another way is buying items from other players and reselling them with a higher price tag, just like in the real world. Many games have auction houses where those transactions can be made. For this type of income one needs to know what to buy when, which prices are low or high, and when to sell for which amount.

For this project my example of game will be Guild Wars 2. Guild Wars 2 was published in 2012 by NCSoft. There are no monthly fees and the basic game is free to play since 2015. It is a massively multiplayer online role-playing game, meaning a lot of people playing in parallel in an online world. There are over 20.000 items that can be collected and many of them can be sold and bought at the auction house, called the trading post.

trading post

This is a picture of the auction house. There are many different types of items (Armor, Weapons, …)

When selling items, 5% of the cost is a fee for the auction house. When buying items, also 5% of the price goes to the bank. Because of that it’s important to find items where the buy-to-sell price ratio is the highest.

Buying and selling items this way is called flipping items in Guild Wars speech.

To make one thing clear, there are already a lot of web pages offering this or similar kind of services like GW2Efficiency or GW2TP. This should only be a fun project exploring the R language and data science in general.

Identify variables

For making the most out of our data, we can try to identify some important variables.

  1. For buying low and selling high, we need to know everything about the prices like average, mean and quantiles.
  2. Where is the highest difference between buy and sell prices?
  3. Are there any outliers that are must-buys?
  4. What is the quantity of items on the market?
  5. Are there any special times when people buy/sell?
  6. Investing in one item, what is the return of investment (ROI)?

Metrics

Currently I’m really inexperienced with the auction market. I just buy the items I need.

The goal is to earn at least 18 gold and 6 silver just by buying/selling items that the exploration/model rates as highest ROI. This is the current values of 1$.

Getting started

Acquire data

Guild Wars API

The data we need can be fetched from a REST API at https://api.guildwars2.com. There is also a good documentation on how to use the API.

We’ll use three endpoint here: 1. Get all items there are in the game

For each item:

2.Get selling and buying listings from auction house api 3. Get basic information about the item like name, rarity etc.

At the end we can merge all this data to get a good overview of which items are sold/bought at the auction house. When joining all this together, we get a huge dataset of: - nearly 30.000 items - around 500.000 buy auctions (orders) - over 4.300.000 sell auctions (offers).

There are also APIs where authorization is needed to get information about one’s user profile, but we won’t need this data for this analysis.

Data Overview

Load the data from CSV. We don’t want to get that amount of data from the API again and again.

dir <- getwd()
item_list_buys <- read.csv(paste(dir, "/gw2-all-buys.csv", sep = ""))
item_list_sells <- read.csv(paste(dir, "/gw2-all-sells.csv", sep = ""))

To get a quick impressions of the data, let’s have a look at the rough numbers.

This is how the buyer data looks like:

glimpse(item_list_buys)
## Rows: 467,406
## Columns: 10
## $ name         <chr> "Sealed Package of Snowballs", "Sealed Package of Snowbal~
## $ description  <chr> "Open this package to create several snowballs that can h~
## $ type         <chr> "Consumable", "Consumable", "Consumable", "Consumable", "~
## $ rarity       <chr> "Basic", "Basic", "Basic", "Basic", "Basic", "Basic", "Ba~
## $ vendor_value <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ id           <int> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 2~
## $ icon         <chr> "https://render.guildwars2.com/file/1D05D1EE04E16E69710E1~
## $ listings     <int> 1, 1, 4, 7, 5, 1, 3, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ unit_price   <int> 85, 81, 80, 77, 76, 75, 74, 72, 68, 67, 66, 65, 64, 63, 6~
## $ quantity     <int> 169, 62, 1000, 1578, 1024, 250, 630, 2365, 250, 250, 38, ~

The seller data has the same structure, just for all the offers:

glimpse(item_list_sells)
## Rows: 4,231,248
## Columns: 10
## $ name         <chr> "Sealed Package of Snowballs", "Sealed Package of Snowbal~
## $ description  <chr> "Open this package to create several snowballs that can h~
## $ type         <chr> "Consumable", "Consumable", "Consumable", "Consumable", "~
## $ rarity       <chr> "Basic", "Basic", "Basic", "Basic", "Basic", "Basic", "Ba~
## $ vendor_value <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ id           <int> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 2~
## $ icon         <chr> "https://render.guildwars2.com/file/1D05D1EE04E16E69710E1~
## $ listings     <int> 1, 1, 2, 3, 6, 9, 4, 4, 5, 2, 3, 6, 5, 11, 41, 5, 8, 8, 6~
## $ unit_price   <int> 275, 360, 372, 373, 374, 375, 376, 377, 378, 379, 380, 38~
## $ quantity     <int> 17, 55, 18, 137, 9, 263, 4, 8, 7, 5, 5, 365, 20, 386, 292~
  • Name: This is the name of the item.
  • Description: Some items also have a description giving further information.
  • Type: can be something like armor, consumable, crafting material and so on
  • Rarity: There are 8 types of rarity from very common to very rare.
  • Vendor_value: What a sell to a NPC dealer would yield.
  • Id: Unique ID of the item.
  • Icon: URL to a picture of the item
  • Listings: How many players are offering this item to this price
  • Unit_Price: What the player wants for this item (in copper)
  • Quantity: How many offers of this items there are for this price.

Listings, Unit_Price and Quantity can be read together.

E.g. for the first data entry:

  • There is 1 (listings) player
  • offering 17 (quantity) pieces of this item
  • for 168 (unit_price) copper each.

We’ll learn more about copper shortly.

Compare total items of all buys and sells:

nrow(item_list_buys)
## [1] 467406
nrow(item_list_sells)
## [1] 4231248

Compare unique items of buys and sells:

item_list_buys %>% 
  select(name) %>% 
  drop_na() %>% 
  unique() %>% 
  count() %>% 
  first()
## [1] 18931
item_list_sells %>% 
  select(name) %>% 
  drop_na() %>% 
  unique() %>% 
  count() %>% 
  first()
## [1] 17896

Currency

coin

Just like in the real world, where 100 cents are 1 Euro, the main currency in Guild Wars, coins, are separated into three units.

There is copper, silver and gold. All those values are the same and are automatically calculated by the game.

10.000 copper == 100 silver == 1 gold.

So we need to divide all values by 10.000 to display the price in gold, which is good for expensive goods and when we are searching for the upper prices.

There may be occasions where it makes more sense to look for silver or even copper.

Filter for most expensive goods and display them:

9999 gold seems to be the maximum possible price of an auction. Our top item here, the Festive Grymm Svaard, is a cosmetic item that can only be bought with real money and lot of luck.

Well this is only an offer, so we’ll have to find out if anyone ever bought this item to this price.

item_list_sells_group <- item_list_sells %>% 
  group_by(name) %>% 
  summarise(unit_price = mean(unit_price) / 10000, 
            vendor_value = mean(vendor_value) / 10000,
            type = unique(type),
            rarity = unique(rarity)) %>% 
  arrange(desc(unit_price))

top5 <- top_n(ungroup(item_list_sells_group), 5, unit_price)

top5 %>% 
  ggplot() +
  geom_bar(aes(x = name, y = unit_price, fill = rarity), stat="identity") +
  geom_point(aes(x = name, y = vendor_value), fill = "#000000", stat="identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_x_discrete(limits = top5$name)

As we can see, the highest order on this item is 300 gold. That’s pretty much, but far away from 9999.

But who doesn’t want to look that fancy. Cordyceps Festive Grymm Svaard

item_list_buys %>% 
  filter(name == 'Festive Grymm Svaard') %>% 
  mutate(unit_price_gold = unit_price / 10000) %>%
  select(name, unit_price_gold) %>% 
  top_n(5) %>% 
  arrange(desc((unit_price_gold))) %>% 
  gt(rowname_col = "name") %>%
  tab_header(title = "Orders of the Festive Grymm Svaard") %>%
  fmt_number(
    columns = unit_price_gold,
    suffixing = "G"
  )
## Selecting by unit_price_gold
Orders of the Festive Grymm Svaard
unit_price_gold
Festive Grymm Svaard 300.70
Festive Grymm Svaard 300.70
Festive Grymm Svaard 276.47
Festive Grymm Svaard 76.47
Festive Grymm Svaard 76.47

Vendor prices are so low (copper range), that they appear as nearly zero in this chart.

If possible, it’s always better to sell items on the action house instead of the vendor.

We’ll come back to items and their value in much more detail later.

The rarity of items

There are different rarities in Guild Wars that suggest how often this item will occur in the game:

rarities <- 
  c("Junk", "Basic", "Fine", "Masterwork", "Rare", "Exotic", "Ascended", "Legendary")

colors <- 
  c("Junk" = "#a3aeb9",
    "Basic" = "#000000",
    "Fine" = "#6a9cd9", 
    "Masterwork" = "#17911a", 
    "Rare" = "#f0cd25",
    "Exotic" = "#e9ad1a",
    "Ascended" = "#e64680",
    "Legendary" = "#53238a")

df <- data.frame(rarities)

df %>% 
  ggplot() +
  geom_bar(aes(x = rarities, fill = rarities)) +
  scale_x_discrete(limits = rarities) +
  scale_fill_manual("Rarity", values = colors) + 
  theme(axis.text.y=element_blank(),
        axis.title.y=element_blank(),
        axis.title.x=element_blank())

What is obvious by the names, rarity on the left is in general more common than rarity on the right.

One important information is that a lot of very rare items (Ascended and Legendary) can’t be sold to other players. That’s the reason we see much more Rare and Exotic items being sold for high prices than those really rare ones. Junk is zero because it can only be sold at non-player vendors.

In our dataset, we have the following distribution:

df <- item_list_sells %>% 
  count(rarity) %>% 
  arrange(desc(n))

df %>% 
  ggplot() +
  geom_bar(aes(x = rarity, y = n, fill = rarity), stat='identity') +
  scale_x_discrete(limits = rarities) +
  scale_fill_manual("Rarity", values = colors)

Types

There are a lot of different types of items. As we can see, Armor, Consumables and Weapons are the most common items.

Here we only see the distribution of items in the game. There are no numbers about sells yet.

This means there are very many different weapon, consumable and armor items in the game, but not that many different crafting materials and upgrade components which seems reasonable.

df <- item_list_sells %>% 
  distinct(type, name) %>%
  group_by(type) %>% 
  count(type)

df %>% 
  ggplot() +
  geom_bar(aes(x = type, y = n, fill = type), stat='identity') +
  theme(axis.text.x = element_text(angle = 45, hjust=1))

We can see some funny types like Quux and Qux, which is used among developers for arbitrary variables. As we can see, there are only 23 items of this type (that can be sold) and have special use-cases in the game.

item_list_sells %>%
 filter(type %in% c("Quux", "Qux")) %>%
  group_by(type, rarity) %>% 
  summarise(name = unique(name),
            mean = mean(unit_price / 1000),
            icon = unique(web_image(icon, height = 50))) %>% 
  arrange(desc(mean)) %>%
  gt(rowname_col = "name") %>%
  tab_header(title = "Quux and Qux", subtitle = "Items with special means") %>%
  fmt_number(
    columns = mean,
    suffixing = "G"
  ) %>% 
  fmt_markdown(
    columns = icon
  ) %>% 
  summary_rows(
    columns = mean,
    fns = list(TOTAL = "sum"),
    formatter = fmt_number
  ) %>%
   tab_footnote(
    footnote = "Prices in gold",
    locations = cells_column_labels(columns = mean)
  ) %>%
  tab_source_note(
    "Based on data from api.guildwars2.com"
  ) %>% 
  tab_options(
    summary_row.background.color = "#ACEACE",
    row_group.background.color = "#FFEFDB",
    table.layout = "auto"
  )
Quux and Qux
Items with special means
mean1 icon
Based on data from api.guildwars2.com
1 Prices in gold

Outliers on Crafting Material

Let’s have a closer look on one of the most important resources in the game: Crafting material.

The base material can be collected everywhere in the world from trees, stones or herbs. As a player, you can learn up to two professions that can improve those base materials to something better and useful.

Let’s first check if there are any outlers in the data that we should clean.

mean_price_crafting_sells <- item_list_sells %>% 
  filter(type == "CraftingMaterial") %>%
  mutate(unit_price_gold = unit_price / 10000)

mean_price_crafting_buys <- item_list_buys %>% 
  filter(type == "CraftingMaterial") %>%
  mutate(unit_price_gold = unit_price / 10000)

ggplot() +  
  geom_boxplot(data = mean_price_crafting_sells, aes(x = 'Sells', y = unit_price_gold)) +
  geom_boxplot(data = mean_price_crafting_buys, aes(x = 'Buys', y = unit_price_gold)) +
  geom_hline(yintercept = 999, linetype="dashed", color = "red") +
  scale_y_continuous(labels = comma)
## Warning: Removed 42 rows containing non-finite values (stat_boxplot).

We can see that on the buys, there aren’t any outliers because the human brain is still working and no one buys overpriced goods. On the other hand, on the sells there are some people who are exaggerating with the prices. Let’s strip everything at 1000 and above.

df_crafting_material <-  item_list_sells %>% 
  filter(type == "CraftingMaterial",
         unit_price < 1000) %>% 
  group_by(rarity, name) %>% 
   summarize(
    mean_price = mean(unit_price / 10000),
    sum_quantity = sum(as.double(listings) * as.double(quantity)),
    picture = web_image(unique(icon), height = 25)
  ) %>% 
  mutate(
    quantity_label = label_number_si(accuracy=0.1)(sum_quantity),
    mean_price_gold = round(mean_price, digits = 3)) %>% 
  arrange(desc(sum_quantity)) %>% 
  select(rarity, name, mean_price_gold, quantity_label, sum_quantity, picture)

df_crafting_material %>% 
  datatable(extensions = c('ColReorder', 'Buttons', 'Responsive', 'Scroller'),
            options = list(colReorder = TRUE,
                           dom = 'Bfrtip', 
                           buttons = list('copy', list(
                                           extend = 'collection',
                                           buttons = c('csv', 'excel', 'pdf'), 
                                           text = 'Download'))),
            escape = FALSE,
            colnames=c("#", "Rarity", "Name", "Mean price in gold", "Quantity (short)", "Quantity", "Icon")
            )
exp_sold_item_name <- df_crafting_material %>% 
  arrange(desc(mean_price_gold)) %>% 
  ungroup() %>%
  select(name) %>% 
  first() %>% 
  first()

Looking at Crafting Material, we first see that there is an overwhelming amount of offers on the market. There are 3.8T items of Nougat Center! Nougat Center, Chattering Skull and Plastic Fangs are all Halloween Event items and therefore not hard to get but also not very useful.

The first real crafting material is Large Claw, with still over 506.6B items.

The most expensive crafting material is Jute Breeches Panel.

unit_price_sells <- item_list_sells %>% 
  filter(name == exp_sold_item_name) %>% 
  mutate(unit_price_gold = unit_price / 10000)

unit_price_buys <- item_list_buys %>% 
  filter(name == exp_sold_item_name) %>% 
  mutate(unit_price_gold = unit_price / 10000)

available <- unit_price_sells %>% 
  mutate(amount = listings * quantity) %>% 
  summarise(sum = sum(amount)) %>% 
  first()

requested <- unit_price_buys %>% 
  mutate(amount = listings * quantity) %>% 
  summarise(sum = sum(amount)) %>% 
  first()

cost_sell <- unit_price_sells %>% 
  select(unit_price_gold) %>% 
  first() %>%
  first() * 0.95

cost_buy <- unit_price_buys %>% select(unit_price_gold) %>% 
  first() %>%
  first() * 0.95

win <- cost_sell - cost_buy  

profit <- win * available
profit_possible <- win * requested
init_costs <- available * cost_buy

When buying and selling the item “Jute Breeches Panel” in category CraftingMaterial with the highest mean price on sell, the earning is 0.07239 per unit.

There are currently 403496 requested items and 3123 available.

This may be a profit of 226.07397 gold (after taxes). This would initially cost us 67.64418 gold.

If we would have more of this item, we could even make 29209.08 gold.

Search for outliers:

# render boxplot with outliers
item_list_sells %>% 
ggplot(aes(x = factor(rarity), y = unit_price, fill = factor(type))) + 
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## Warning: Removed 388 rows containing non-finite values (stat_boxplot).